Overview
Dataset statistics
| Number of variables | 4 |
|---|---|
| Number of observations | 99441 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.0 MiB |
| Average record size in memory | 32.0 B |
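The memory figures above are consistent with a simple back-of-the-envelope estimate. The sketch below assumes each cell costs 8 bytes (a 64-bit integer or an object pointer) and ignores index overhead and string payloads; that per-cell cost is an assumption, not something the report states.

```python
# Rough memory estimate; the 8-bytes-per-cell figure is an assumption.
N_ROWS = 99441      # "Number of observations" in the report
N_COLS = 4          # "Number of variables" in the report
BYTES_PER_CELL = 8  # assumed: int64 value or object pointer

total_bytes = N_ROWS * N_COLS * BYTES_PER_CELL
total_mib = total_bytes / 2**20        # bytes -> MiB
record_bytes = total_bytes / N_ROWS    # average bytes per row

print(f"total: {total_mib:.1f} MiB, per record: {record_bytes:.1f} B")
```

Under that assumption the estimate rounds to the reported 3.0 MiB and 32.0 B.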
Variable types
| Text | 2 |
|---|---|
| Numeric | 1 |
| Categorical | 1 |
Alerts
| customer_state is highly overall correlated with customer_zip_code_prefix | High correlation |
|---|---|
| customer_zip_code_prefix is highly overall correlated with customer_state | High correlation |
| customer_id has unique values | Unique |
Reproduction
| Analysis started | 2024-11-21 11:10:40.697541 |
|---|---|
| Analysis finished | 2024-11-21 11:14:02.494630 |
| Duration | 3 minutes and 21.8 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
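A report like this one can presumably be regenerated along the following lines. This is a minimal sketch: the input file name, output file name, and report title are hypothetical, and the original run may have used settings from config.json that are not shown here.

```python
def build_report(csv_path, output_path="customers_profile.html"):
    """Profile a CSV file and write the HTML report to output_path."""
    # Imported lazily so this sketch can be loaded even when the optional
    # third-party dependencies are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is a hypothetical placeholder, not the original report's.
    profile = ProfileReport(df, title="Customers profile")
    profile.to_file(output_path)
    return output_path

# Example call (hypothetical file name):
# build_report("customers.csv")
```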
Variables
customer_id
Text
Unique 
| Distinct | 99441 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 777.0 KiB |
Length
| Max length | 32 |
|---|---|
| Median length | 32 |
| Mean length | 32 |
| Min length | 32 |
Unique
| Unique | 99441 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 06b8999e2fba1a1fbc88172c00ba8bc7 |
|---|---|
| 2nd row | 18955e83d337fd6b2def6b18a428ac77 |
| 3rd row | 4e7b3e00288586ebd08712fdd0374a03 |
| 4th row | b2b6027bc5c5109e529d4dc6358b12c3 |
| 5th row | 4f2d8ab171c80ec8364f7c12e35b23ad |
| Value | Count | Frequency (%) |
| 06b8999e2fba1a1fbc88172c00ba8bc7 | 1 | < 0.1% |
| 4d27341acd30a36bca39008ee9bb9050 | 1 | < 0.1% |
| b2b6027bc5c5109e529d4dc6358b12c3 | 1 | < 0.1% |
| 4f2d8ab171c80ec8364f7c12e35b23ad | 1 | < 0.1% |
| 879864dab9bc3047522c92c82e1212b8 | 1 | < 0.1% |
| fd826e7cf63160e536e0908c76c3f441 | 1 | < 0.1% |
| 5e274e7a0c3809e14aba7ad5aae0d407 | 1 | < 0.1% |
| 5adf08e34b2e993982a47070956c5c65 | 1 | < 0.1% |
| 4b7139f34592b3a31687243a302fa75b | 1 | < 0.1% |
| 9fb35e4ed6f0a14a4977cd9aea4042bb | 1 | < 0.1% |
| Other values (99431) | 99431 |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 199366 | 6.3% |
| f | 199255 | 6.3% |
| 2 | 199235 | 6.3% |
| c | 199193 | 6.3% |
| 1 | 199150 | 6.3% |
| b | 199137 | 6.3% |
| 8 | 199094 | 6.3% |
| 3 | 199061 | 6.3% |
| 7 | 198923 | 6.3% |
| 6 | 198760 | 6.2% |
| Other values (6) | 1190938 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1988533 | 62.5% |
| Lowercase Letter | 1193579 | 37.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 199366 | 10.0% |
| 2 | 199235 | 10.0% |
| 1 | 199150 | 10.0% |
| 8 | 199094 | 10.0% |
| 3 | 199061 | 10.0% |
| 7 | 198923 | 10.0% |
| 6 | 198760 | 10.0% |
| 9 | 198689 | 10.0% |
| 0 | 198310 | 10.0% |
| 4 | 197945 | 10.0% |
Lowercase Letter
| Value | Count | Frequency (%) |
| f | 199255 | 16.7% |
| c | 199193 | 16.7% |
| b | 199137 | 16.7% |
| e | 198713 | 16.6% |
| a | 198646 | 16.6% |
| d | 198635 | 16.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1988533 | 62.5% |
| Latin | 1193579 | 37.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 199366 | 10.0% |
| 2 | 199235 | 10.0% |
| 1 | 199150 | 10.0% |
| 8 | 199094 | 10.0% |
| 3 | 199061 | 10.0% |
| 7 | 198923 | 10.0% |
| 6 | 198760 | 10.0% |
| 9 | 198689 | 10.0% |
| 0 | 198310 | 10.0% |
| 4 | 197945 | 10.0% |
Latin
| Value | Count | Frequency (%) |
| f | 199255 | 16.7% |
| c | 199193 | 16.7% |
| b | 199137 | 16.7% |
| e | 198713 | 16.6% |
| a | 198646 | 16.6% |
| d | 198635 | 16.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3182112 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 199366 | 6.3% |
| f | 199255 | 6.3% |
| 2 | 199235 | 6.3% |
| c | 199193 | 6.3% |
| 1 | 199150 | 6.3% |
| b | 199137 | 6.3% |
| 8 | 199094 | 6.3% |
| 3 | 199061 | 6.3% |
| 7 | 198923 | 6.3% |
| 6 | 198760 | 6.2% |
| Other values (6) | 1190938 |
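The statistics above suggest that every customer_id is a 32-character lowercase hexadecimal string with roughly uniform digits. The report does not say how the ids were generated, so the following is only a sanity-check sketch that validates the five sample rows and reproduces the character total.

```python
import re

# Exactly 32 lowercase hexadecimal characters.
HEX32 = re.compile(r"[0-9a-f]{32}")

# The five sample rows shown in the report.
sample_ids = [
    "06b8999e2fba1a1fbc88172c00ba8bc7",
    "18955e83d337fd6b2def6b18a428ac77",
    "4e7b3e00288586ebd08712fdd0374a03",
    "b2b6027bc5c5109e529d4dc6358b12c3",
    "4f2d8ab171c80ec8364f7c12e35b23ad",
]

all_hex = all(HEX32.fullmatch(s) is not None for s in sample_ids)

# 99441 ids of 32 characters each give the ASCII block total (3182112).
total_chars = 99441 * 32

# If the 16 hex digits were equally likely, each would account for
# 1/16 of all characters, close to the 6.2-6.3% the report shows.
expected_share = 100 / 16
```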
customer_zip_code_prefix
Real number (ℝ)
High correlation 
| Distinct | 14994 |
|---|---|
| Distinct (%) | 15.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35137.475 |
| Minimum | 1003 |
| Maximum | 99990 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 777.0 KiB |
Quantile statistics
| Minimum | 1003 |
|---|---|
| 5-th percentile | 3315 |
| Q1 | 11347 |
| median | 24416 |
| Q3 | 58900 |
| 95-th percentile | 90550 |
| Maximum | 99990 |
| Range | 98987 |
| Interquartile range (IQR) | 47553 |
Descriptive statistics
| Standard deviation | 29797.939 |
|---|---|
| Coefficient of variation (CV) | 0.84803872 |
| Kurtosis | -0.78820393 |
| Mean | 35137.475 |
| Median Absolute Deviation (MAD) | 16386 |
| Skewness | 0.77902506 |
| Sum | 3.4941056 × 10^9 |
| Variance | 8.8791717 × 10^8 |
| Monotonicity | Not monotonic |
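Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the rounded values printed in the report, so small differences in the last digits are to be expected.

```python
import math

# Values copied from the report (already rounded there).
n = 99441
mean = 35137.475
std = 29797.939
q1, q3 = 11347, 58900
minimum, maximum = 1003, 99990

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum of all values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.4e} sum={total:.4e}")
print(f"IQR={iqr} range={value_range}")
```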
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 22790 | 142 | 0.1% |
| 24220 | 124 | 0.1% |
| 22793 | 121 | 0.1% |
| 24230 | 117 | 0.1% |
| 22775 | 110 | 0.1% |
| 29101 | 101 | 0.1% |
| 13212 | 95 | 0.1% |
| 35162 | 93 | 0.1% |
| 22631 | 89 | 0.1% |
| 38400 | 87 | 0.1% |
| Other values (14984) | 98362 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1003 | 1 | < 0.1% |
| 1004 | 2 | < 0.1% |
| 1005 | 6 | < 0.1% |
| 1006 | 2 | < 0.1% |
| 1007 | 4 | < 0.1% |
| 1008 | 4 | < 0.1% |
| 1009 | 7 | < 0.1% |
| 1011 | 5 | < 0.1% |
| 1012 | 3 | < 0.1% |
| 1013 | 3 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99990 | 1 | < 0.1% |
| 99980 | 2 | < 0.1% |
| 99970 | 1 | < 0.1% |
| 99965 | 2 | < 0.1% |
| 99960 | 2 | < 0.1% |
| 99955 | 3 | < 0.1% |
| 99950 | 9 | < 0.1% |
| 99940 | 2 | < 0.1% |
| 99930 | 5 | < 0.1% |
| 99925 | 1 | < 0.1% |
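Because this column is typed as a number, values such as 1003 carry no leading zero, and statistics like the mean have little meaning for a postal code. Assuming the prefixes are meant to be five digits wide (an assumption suggested by the maximum of 99990, not stated in the report), they could be restored to fixed-width strings as sketched here.

```python
def format_zip_prefix(value: int) -> str:
    """Return a zip-code prefix as a zero-padded 5-character string.

    The 5-digit width is an assumption, not taken from the report.
    """
    return f"{value:05d}"

# Values that appear in the report: the minimum, two sample rows, the maximum.
samples = [1003, 9790, 14409, 99990]
formatted = [format_zip_prefix(v) for v in samples]
print(formatted)  # ['01003', '09790', '14409', '99990']
```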
customer_city
Text
| Distinct | 4119 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 777.0 KiB |
Length
| Max length | 32 |
|---|---|
| Median length | 27 |
| Mean length | 10.344466 |
| Min length | 3 |
Unique
| Unique | 1144 |
|---|---|
| Unique (%) | 1.2% |
Sample
| 1st row | franca |
|---|---|
| 2nd row | sao bernardo do campo |
| 3rd row | sao paulo |
| 4th row | mogi das cruzes |
| 5th row | campinas |
| Value | Count | Frequency (%) |
| sao | 21050 | 12.1% |
| paulo | 15606 | 9.0% |
| de | 9684 | 5.6% |
| rio | 8278 | 4.7% |
| janeiro | 6882 | 3.9% |
| do | 4276 | 2.5% |
| belo | 2833 | 1.6% |
| horizonte | 2798 | 1.6% |
| brasilia | 2140 | 1.2% |
| porto | 1648 | 0.9% |
| Other values (3285) | 99118 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 169618 | 16.5% |
| o | 126534 | 12.3% |
| i | 78754 | 7.7% |
| r | 76497 | 7.4% |
| (space) | 74872 | 7.3% |
| e | 67028 | 6.5% |
| s | 62903 | 6.1% |
| n | 45721 | 4.4% |
| u | 44917 | 4.4% |
| l | 44815 | 4.4% |
| Other values (21) | 237005 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 953332 | 92.7% |
| Space Separator | 74872 | 7.3% |
| Dash Punctuation | 232 | < 0.1% |
| Other Punctuation | 226 | < 0.1% |
| Decimal Number | 2 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 169618 | 17.8% |
| o | 126534 | 13.3% |
| i | 78754 | 8.3% |
| r | 76497 | 8.0% |
| e | 67028 | 7.0% |
| s | 62903 | 6.6% |
| n | 45721 | 4.8% |
| u | 44917 | 4.7% |
| l | 44815 | 4.7% |
| p | 37119 | 3.9% |
| Other values (16) | 199426 |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1 | 50.0% |
| 4 | 1 | 50.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 74872 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 232 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 226 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 953332 | 92.7% |
| Common | 75332 | 7.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 169618 | 17.8% |
| o | 126534 | 13.3% |
| i | 78754 | 8.3% |
| r | 76497 | 8.0% |
| e | 67028 | 7.0% |
| s | 62903 | 6.6% |
| n | 45721 | 4.8% |
| u | 44917 | 4.7% |
| l | 44815 | 4.7% |
| p | 37119 | 3.9% |
| Other values (16) | 199426 |
Common
| Value | Count | Frequency (%) |
| (space) | 74872 | 99.4% |
| - | 232 | 0.3% |
| ' | 226 | 0.3% |
| 1 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1028664 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 169618 | 16.5% |
| o | 126534 | 12.3% |
| i | 78754 | 7.7% |
| r | 76497 | 7.4% |
| (space) | 74872 | 7.3% |
| e | 67028 | 6.5% |
| s | 62903 | 6.1% |
| n | 45721 | 4.4% |
| u | 44917 | 4.4% |
| l | 44815 | 4.4% |
| Other values (21) | 237005 |
customer_state
Categorical
High correlation 
| Distinct | 27 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 777.0 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | SP |
|---|---|
| 2nd row | SP |
| 3rd row | SP |
| 4th row | SP |
| 5th row | SP |
Common Values
| Value | Count | Frequency (%) |
| SP | 41746 | 42.0% |
| RJ | 12852 | 12.9% |
| MG | 11635 | 11.7% |
| RS | 5466 | 5.5% |
| PR | 5045 | 5.1% |
| SC | 3637 | 3.7% |
| BA | 3380 | 3.4% |
| DF | 2140 | 2.2% |
| ES | 2033 | 2.0% |
| GO | 2020 | 2.0% |
| Other values (17) | 9487 | 9.5% |
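The frequency column is each count divided by the 99441 observations. A short sketch that recomputes the shares from the counts in this table, which also recovers the share for SP:

```python
# Counts copied from the Common Values table.
counts = {
    "SP": 41746, "RJ": 12852, "MG": 11635, "RS": 5466, "PR": 5045,
    "SC": 3637, "BA": 3380, "DF": 2140, "ES": 2033, "GO": 2020,
    "Other values (17)": 9487,
}

total = sum(counts.values())  # should equal the number of observations

# Percentage share of each state, rounded to one decimal like the report.
shares = {state: round(100 * count / total, 1)
          for state, count in counts.items()}

print(total, shares["SP"], shares["RJ"])
```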
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| sp | 41746 | 42.0% |
| rj | 12852 | 12.9% |
| mg | 11635 | 11.7% |
| rs | 5466 | 5.5% |
| pr | 5045 | 5.1% |
| sc | 3637 | 3.7% |
| ba | 3380 | 3.4% |
| df | 2140 | 2.2% |
| es | 2033 | 2.0% |
| go | 2020 | 2.0% |
| Other values (17) | 9487 | 9.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 53947 | 27.1% |
| P | 50517 | 25.4% |
| R | 24193 | 12.2% |
| M | 14152 | 7.1% |
| G | 13655 | 6.9% |
| J | 12852 | 6.5% |
| A | 5812 | 2.9% |
| E | 5371 | 2.7% |
| C | 5054 | 2.5% |
| B | 3916 | 2.0% |
| Other values (7) | 9413 | 4.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 198882 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 53947 | 27.1% |
| P | 50517 | 25.4% |
| R | 24193 | 12.2% |
| M | 14152 | 7.1% |
| G | 13655 | 6.9% |
| J | 12852 | 6.5% |
| A | 5812 | 2.9% |
| E | 5371 | 2.7% |
| C | 5054 | 2.5% |
| B | 3916 | 2.0% |
| Other values (7) | 9413 | 4.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 198882 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 53947 | 27.1% |
| P | 50517 | 25.4% |
| R | 24193 | 12.2% |
| M | 14152 | 7.1% |
| G | 13655 | 6.9% |
| J | 12852 | 6.5% |
| A | 5812 | 2.9% |
| E | 5371 | 2.7% |
| C | 5054 | 2.5% |
| B | 3916 | 2.0% |
| Other values (7) | 9413 | 4.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 198882 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 53947 | 27.1% |
| P | 50517 | 25.4% |
| R | 24193 | 12.2% |
| M | 14152 | 7.1% |
| G | 13655 | 6.9% |
| J | 12852 | 6.5% |
| A | 5812 | 2.9% |
| E | 5371 | 2.7% |
| C | 5054 | 2.5% |
| B | 3916 | 2.0% |
| Other values (7) | 9413 | 4.7% |
Interactions
Correlations
| | customer_state | customer_zip_code_prefix |
|---|---|---|
| customer_state | 1.000 | 0.922 |
| customer_zip_code_prefix | 0.922 | 1.000 |
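The report does not state how the 0.922 coefficient was computed. One common way to measure association between a categorical column and a numeric one is to bin the numeric column and compute Cramér's V; the sketch below illustrates that idea only and is not claimed to be the report's method. It uses the ten first rows listed in the Sample section and bins each prefix by the leading digit of its zero-padded 5-digit form (both the padding and the binning are illustrative assumptions), so its result is not expected to match 0.922.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) between two equal-length label lists."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing pairs count as 0
    chi2 = 0.0
    for x, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# First ten rows of the report's sample.
states = ["SP", "SP", "SP", "SP", "SP", "SC", "SP", "MG", "PR", "MG"]
prefixes = [14409, 9790, 1151, 8775, 13056, 89254, 4534, 35182, 81560, 30575]

# Assumed binning: leading digit of the zero-padded prefix.
bins = [f"{p:05d}"[0] for p in prefixes]

v = cramers_v(states, bins)
print(round(v, 3))
```

Even on ten rows the association is strong, which fits the intuition that postal-code ranges are allocated by region.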
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
Sample
First rows
| | customer_id | customer_zip_code_prefix | customer_city | customer_state |
|---|---|---|---|---|
| 0 | 06b8999e2fba1a1fbc88172c00ba8bc7 | 14409 | franca | SP |
| 1 | 18955e83d337fd6b2def6b18a428ac77 | 9790 | sao bernardo do campo | SP |
| 2 | 4e7b3e00288586ebd08712fdd0374a03 | 1151 | sao paulo | SP |
| 3 | b2b6027bc5c5109e529d4dc6358b12c3 | 8775 | mogi das cruzes | SP |
| 4 | 4f2d8ab171c80ec8364f7c12e35b23ad | 13056 | campinas | SP |
| 5 | 879864dab9bc3047522c92c82e1212b8 | 89254 | jaragua do sul | SC |
| 6 | fd826e7cf63160e536e0908c76c3f441 | 4534 | sao paulo | SP |
| 7 | 5e274e7a0c3809e14aba7ad5aae0d407 | 35182 | timoteo | MG |
| 8 | 5adf08e34b2e993982a47070956c5c65 | 81560 | curitiba | PR |
| 9 | 4b7139f34592b3a31687243a302fa75b | 30575 | belo horizonte | MG |
Last rows
| | customer_id | customer_zip_code_prefix | customer_city | customer_state |
|---|---|---|---|---|
| 99431 | be842c57a8c5a62e9585dd72f22b6338 | 99150 | marau | RS |
| 99432 | f255d679c7c86c24ef4861320d5b7675 | 13500 | rio claro | SP |
| 99433 | 14308d2303a3e2bdf4939b86c46d2679 | 66033 | belem | PA |
| 99434 | f5a0b560f9e9427792a88bec97710212 | 7790 | cajamar | SP |
| 99435 | 7fe2e80252a9ea476f950ae8f85b0f8f | 35500 | divinopolis | MG |
| 99436 | 17ddf5dd5d51696bb3d7c6291687be6f | 3937 | sao paulo | SP |
| 99437 | e7b71a9017aa05c9a7fd292d714858e8 | 6764 | taboao da serra | SP |
| 99438 | 5e28dfe12db7fb50a4b2f691faecea5e | 60115 | fortaleza | CE |
| 99439 | 56b18e2166679b8a959d72dd06da27f9 | 92120 | canoas | RS |
| 99440 | 274fa6071e5e17fe303b9748641082c8 | 6703 | cotia | SP |